0.1 Introduction to Networks

So, before we talk about networks, one thing upfront… why should we? I mean, they undeniably look pretty, don’t they?

Somehow, the visualization of networks fascinates the human mind (there is a short TED talk on networks and how they depict our world), and it has even inspired its own art movement, networkism.

Yet, besides that, is there any analytical value in networks that should make a data scientist bother?

0.2 Networks in R

There are a number of applications designed for network analysis and the creation of network graphs, such as Gephi and Cytoscape. Though not specifically designed for it, R has developed into a powerful tool for network analysis.

Significant network analysis packages for R include network, sna, and igraph. In addition, Thomas Lin Pedersen has recently released the tidygraph package, which leverages the power of igraph in a manner consistent with the tidyverse workflow. Even better, he tops it up with ggraph, a network visualization package with a consistent ggplot2 look and feel.

R can also be used to make interactive network graphs with the htmlwidgets framework, which translates R code to JavaScript. Cool implementations thereof are the visNetwork and networkD3 packages.

As the analytical tool in this lab, I will mostly use igraph. In terms of functions, it is pretty much equivalent to network, yet slightly more powerful, better integrated, and better maintained. Since both packages share many function names, it is better not to load them both at once.

0.3 The Basic Structure of Networks

0.3.1 The Basic Jargon

First of all, what is a network? Plainly speaking, a network is a system of elements which are connected by some relationship. The vocabulary can be a bit technical and even inconsistent between different disciplines, packages, and software. The whole system is (surprise, surprise) usually called a network or graph. The elements are commonly referred to as nodes (system theory jargon) or vertices (graph theory jargon) of a graph, while the connections are edges or links. I will mostly refer to the elements as nodes, and their connections as edges.

Generally, networks are a form of representing relational data. This is a very general tool that can be applied to many different types of relationships between all kinds of elements. The content, meaning, and interpretation certainly depend on which elements we display and which types of relationships. For example:

  • In Social Network Analysis:
    • Nodes represent actors (which can be persons, firms and other socially constructed entities)
    • Edges represent relationships between these actors (friendship, interaction, co-affiliation, similarity, etc.)
  • Other types of networks
    • Chemistry: Interaction between molecules
    • Computer Science: The World Wide Web, inter- and intranet topologies
    • Biology: Food webs, ant hives

The possibilities to depict relational data are manifold. For example:

  • Relations among persons
    • Kinship: mother of, wife of…
    • Other role based: boss of, supervisor of…
    • Cognitive/perceptual: knows, aware of what they know…
    • Affective: likes, trusts…
    • Interaction: give advice, talks to…
    • Affiliation: belong to same clubs, shares same interests…
  • Relations among organizations
    • As corporate entities
      • Buy from / sell to, leases to, outsources to
      • Owns shares of, subsidiary of
      • Joint ventures, strategic alliances
    • Via their members
      • Personnel flows
      • Interlocking directorates
      • Personal friendships
      • Co-memberships
  • Relations among other (non-social) entities
    • Patents
      • Patents citing other patents
      • Co-occurrence of technological classes
    • Research fields
      • through citations
      • through people co-affiliated with fields
    • Sectors
      • input-output relations
      • Labor mobility
    • Technologies
      • Patent IPC classes
      • Semantic co-occurrence

Note: Content matters! Each relation yields a different structure and has different effects. A theory might make sense in an inter-personal context, but not in an inter-organizational or non-social one.

0.4 The Data-Structure of Relational Data

0.4.1 Edgelist

Most real-world relational data comes in what we call an edge list: a data frame that contains at least two columns, one with the nodes that are the source of a connection and another with the nodes that are the target of the connection. The nodes in the data are identified by unique IDs. If the distinction between source and target is meaningful, the network is directed; if it is not, the network is undirected (more on that later). So, every row that contains the ID of one element in column 1 and the ID of another element in column 2 indicates that a connection between them exists. An edge list can also contain additional columns that describe attributes of the edges, such as the magnitude of an edge. If the edges have a magnitude attribute, the graph is considered weighted (more on that later). Below is an example of a minimal edge list created with the tibble() function.

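library(tidyverse) # provides tibble() and the %>% pipe used throughout
# One row per edge: a connection from node `from` to node `to`
edge_list <- tibble(from = c(1, 2, 2, 3, 4), to = c(2, 3, 4, 2, 1))
edge_list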
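To make the directed/undirected and weighted distinctions concrete, here is a minimal base-R sketch (deliberately without tidyverse, so it runs on its own). It reads the toy edge list as undirected, so the rows 2→3 and 3→2 describe the same tie, and collapses repeated ties into a `weight` column that counts them. The column names `a`, `b`, and `weight` are my own choices for illustration, not a convention of any package.

```r
# Same toy edge list as above, as a plain data frame
edges <- data.frame(from = c(1, 2, 2, 3, 4), to = c(2, 3, 4, 2, 1))

# Undirected reading: order each pair so that (3, 2) and (2, 3) become identical
undirected <- data.frame(a = pmin(edges$from, edges$to),
                         b = pmax(edges$from, edges$to),
                         weight = 1)

# Collapse duplicate pairs; the summed weight counts how often each tie occurs
weighted <- aggregate(weight ~ a + b, data = undirected, FUN = sum)
weighted
```

In the result, the tie between nodes 2 and 3 ends up with weight 2 and all other ties with weight 1, so five directed rows become four weighted undirected edges.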

Sometimes it is preferable to also create a separate node list. At its simplest, a node list is a data frame with a single column - which I will label as “id” - that lists the node IDs found in the edge list. The advantage of creating a separate node list is the ability to add attribute columns to the data frame such as the names of the nodes or any kind of groupings.

set.seed(42) # sample() is random; fixing the seed makes the groups reproducible
node_list <- tibble(id = 1:4, group = sample(letters[1:2], 4, replace = TRUE))
node_list
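A node list only pays off if it is consistent with the edge list. A small base-R sketch of two checks I find useful: every ID used in the edge list should appear in the node list, and node attributes can be looked up per edge with `match()`. The fixed `group` values replace the random `sample()` call purely so the example is reproducible, and `from_group` is a hypothetical column name.

```r
edges <- data.frame(from = c(1, 2, 2, 3, 4), to = c(2, 3, 4, 2, 1))
nodes <- data.frame(id = 1:4, group = c("a", "b", "a", "b"),
                    stringsAsFactors = FALSE)

# IDs that occur in the edge list but are missing from the node list
missing_ids <- setdiff(unique(c(edges$from, edges$to)), nodes$id)
if (length(missing_ids) > 0) {
  warning("edge list uses unknown node IDs: ", toString(missing_ids))
}

# Attach the group of the source node to every edge
edges$from_group <- nodes$group[match(edges$from, nodes$id)]
edges
```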

0.4.2 Adjacency Matrix

A second popular form of network representation is the adjacency matrix (also called socio-matrix). It is an \(n \times n\) matrix, where \(n\) stands for the number of nodes whose relationships should be represented. The value in the cell at the intersection of row \(i\) and column \(j\) indicates whether an edge from node \(i\) to node \(j\) is present (=1) or absent (=0). For undirected networks, the matrix is symmetric.

Tip: Given an edge list, an adjacency matrix can easily be produced by cross-tabulating it:

adj_matrix <- table(edge_list) %>% as.matrix()
adj_matrix
##     to
## from 1 2 3 4
##    1 0 1 0 0
##    2 0 0 1 1
##    3 0 1 0 0
##    4 1 0 0 0
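One caveat with `table()`: it only creates rows and columns for IDs that actually occur in the respective column, so a node that never sends (or never receives) an edge silently disappears and the matrix may not even be square. A base-R sketch that avoids this by filling a pre-allocated \(n \times n\) matrix, assuming (as in the toy data) that nodes are numbered 1 to \(n\). `edges_to_adjacency()` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: edge list -> binary n x n adjacency matrix (nodes numbered 1..n)
edges_to_adjacency <- function(from, to, n = max(from, to), directed = TRUE) {
  adj <- matrix(0L, nrow = n, ncol = n, dimnames = list(1:n, 1:n))
  adj[cbind(from, to)] <- 1L                 # cell [i, j] = 1 if there is an edge i -> j
  if (!directed) adj[cbind(to, from)] <- 1L  # mirror every edge: matrix becomes symmetric
  adj                                        # binary: repeated edges do not add up
}

from <- c(1, 2, 2, 3, 4)
to   <- c(2, 3, 4, 2, 1)
adj_directed   <- edges_to_adjacency(from, to)
adj_undirected <- edges_to_adjacency(from, to, directed = FALSE)
adj_directed
```

For the directed case, the result should reproduce the cross-tabulation above; in the undirected version, 2→3 and 3→2 fall onto the same pair of cells.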

0.5 Generating a Graph Object in tidygraph

So, now it is finally time to use tidygraph.

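library(tidygraph) # as_tbl_graph(), centrality_*() and group_*() functions
library(ggraph)    # ggplot2-style network plots
# directed = FALSE: the edges 2->3 and 3->2 become two parallel edges,
# which is why the graph is reported as a multigraph
g <- edge_list %>% as_tbl_graph(directed = FALSE)
g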
## # A tbl_graph: 4 nodes and 5 edges
## #
## # An undirected multigraph with 1 component
## #
## # Node Data: 4 x 1 (active)
##   name 
##   <chr>
## 1 1    
## 2 2    
## 3 3    
## 4 4    
## #
## # Edge Data: 5 x 2
##    from    to
##   <int> <int>
## 1     1     2
## 2     2     3
## 3     2     4
## # ... with 2 more rows
g %>% ggraph(layout = 'nicely') + 
  geom_edge_link() + 
  geom_node_point() + 
  geom_node_text(aes(label = name))

While being able to use the dplyr verbs on relational data is nice and all, one of the reasons we are dealing with graph data in the first place is that we need some graph-based algorithms to solve the problem at hand. If we had to break out of the tidy workflow every time we needed one, we wouldn’t have gained much. Because of this, tidygraph wraps more or less all of igraph’s algorithms, ensuring a consistent syntax as well as output that fits into the tidy workflow. In the following, we’re going to take a look at these.

Central to all of these functions is that they know which graph is being computed on (in the same way that n() knows which tibble is currently in scope). Furthermore, they always return results matching the node or edge position, so they can be used directly in mutate() calls.
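To illustrate what "results matching the node position" means, here is degree centrality computed by hand in base R for the toy undirected multigraph `g` from above: one number per node, in node order, ready to be attached as a column. I would expect `centrality_degree()` on `g` to return the same values, since igraph counts each parallel edge towards the degree, but treat that as an assumption worth checking.

```r
from <- c(1, 2, 2, 3, 4)
to   <- c(2, 3, 4, 2, 1)
n_nodes <- 4

# In an undirected graph every edge adds 1 to the degree of both endpoints,
# so degree = number of times a node appears in either column.
# factor(levels = ...) keeps isolated nodes (degree 0) and fixes the node order.
degree <- as.vector(table(factor(c(from, to), levels = seq_len(n_nodes))))

node_degrees <- data.frame(id = seq_len(n_nodes), degree = degree)
node_degrees
```

Node 2 gets degree 4 (it is involved in four of the five edges); all other nodes get degree 2. The degrees sum to twice the number of edges, as they must.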

0.6 Network effects & structures

One of the simplest concepts when computing graph-based values is that of centrality, i.e. how central a node or edge is in the graph. As this definition is inherently vague, a lot of different centrality scores exist that all treat the concept of "central" a bit differently. One of the most famous is the PageRank algorithm, which powered Google Search in its early days. At the time of writing, tidygraph has 11 different centrality measures, all prefixed with centrality_* for easy discoverability. All of them return a numeric vector matching the nodes (or edges in the case of centrality_edge_betweenness()).

g <- play_smallworld(1, 100, 3, 0.05) %>% 
    mutate(centrality_dgr = centrality_degree(),
           centrality_eigen = centrality_eigen(),
           centrality_between = centrality_betweenness()) 
g %>%
    ggraph(layout = "kk") + 
    geom_edge_link() + 
    geom_node_point(aes(size = centrality_dgr, colour = centrality_dgr)) + 
    scale_color_continuous(guide = "legend") + 
    theme_graph()

g %>%
    ggraph(layout = "kk") + 
    geom_edge_link() + 
    geom_node_point(aes(size = centrality_eigen, colour = centrality_eigen)) + 
    scale_color_continuous(guide = "legend") + 
    theme_graph()

g %>%
    ggraph(layout = "kk") + 
    geom_edge_link() + 
    geom_node_point(aes(size = centrality_between, colour = centrality_between)) + 
    scale_color_continuous(guide = "legend") + 
    theme_graph()
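PageRank, mentioned above, can be sketched in a few lines of base R via power iteration: each node passes its score on equally along its outgoing edges, and with probability \(1 - d\) the "random surfer" jumps to a random node instead. This is a simplified sketch, assuming no dangling nodes (every node has at least one outgoing edge) and the customary damping factor \(d = 0.85\). It uses the small directed toy edge list from the beginning rather than the small-world graph, and `pagerank_power()` is a hypothetical helper.

```r
pagerank_power <- function(adj, d = 0.85, tol = 1e-10, max_iter = 1000) {
  n <- nrow(adj)
  out_deg <- rowSums(adj)
  if (any(out_deg == 0)) stop("this sketch assumes every node has an outgoing edge")
  P <- adj / out_deg            # row i divided by the out-degree of i: row-stochastic
  p <- rep(1 / n, n)            # start from the uniform distribution
  for (i in seq_len(max_iter)) {
    # random jump with probability 1 - d, follow an outgoing edge with probability d
    p_new <- (1 - d) / n + d * as.vector(crossprod(P, p))  # crossprod(P, p) = t(P) %*% p
    converged <- sum(abs(p_new - p)) < tol
    p <- p_new
    if (converged) break
  }
  p
}

# Directed toy graph: 1->2, 2->3, 2->4, 3->2, 4->1
adj <- matrix(0, 4, 4)
adj[cbind(c(1, 2, 2, 3, 4), c(2, 3, 4, 2, 1))] <- 1
pr <- pagerank_power(adj)
round(pr, 3)
```

Node 2, which receives the full score of both node 1 and node 3, ends up with the highest PageRank; nodes 3 and 4 each receive half of node 2's score and therefore tie.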

0.7 Clustering (Community detection)

Another common operation is to group nodes based on the graph topology, often referred to as community detection because of its prevalence in social network analysis. All clustering algorithms from igraph are available in tidygraph using the group_* prefix. All of these functions return an integer vector, with nodes (or edges) sharing the same integer being grouped together.

g <- play_islands(5, 10, 0.8, 3) %N>% 
    mutate(community = as.factor(group_louvain())) 


g %>% 
    ggraph(layout = 'kk') + 
    geom_edge_link(aes(alpha = ..index..), show.legend = FALSE) + 
    geom_node_point(aes(colour = community), size = 7) + 
    theme_graph()
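group_louvain() is based on the Louvain method, which greedily searches for a partition with high modularity \(Q = \frac{1}{2m}\sum_{ij}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)\), i.e. the share of edges within communities minus the share expected if edges were wired at random with the same degrees. Below is a base-R sketch of how a given partition is scored (not of the Louvain search itself), on a made-up graph of two triangles joined by a single edge. `modularity_score()` is a hypothetical helper name; igraph itself provides `modularity()` for this purpose.

```r
modularity_score <- function(adj, membership) {
  k  <- rowSums(adj)                              # node degrees
  m2 <- sum(adj)                                  # 2m: each undirected edge counted twice
  same <- outer(membership, membership, "==")     # delta(c_i, c_j)
  sum((adj - outer(k, k) / m2)[same]) / m2
}

# Two triangles {1,2,3} and {4,5,6}, bridged by the edge 3-4
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
adj <- matrix(0, 6, 6)
adj[edges] <- 1
adj[edges[, 2:1]] <- 1                            # undirected: symmetric matrix

q_split <- modularity_score(adj, c(1, 1, 1, 2, 2, 2))
q_all   <- modularity_score(adj, rep(1, 6))
c(split = q_split, all_in_one = q_all)
```

Splitting at the bridge gives \(Q = 5/14 \approx 0.36\), while lumping all nodes into one community gives 0, so a modularity-maximizing method would prefer the split.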

1 Bibliographic mapping

1.1 Basics

Let's talk about bibliographic networks. In short, these are networks of documents that cite each other. These are commonly academic publications, but can also be patents or policy reports. Conceptually, we can see them as 2-mode networks between articles and their references. That helps us to apply some interesting metrics, such as:

  • direct citations
  • Bibliographic coupling
  • Co-citations

Interestingly, different projections of this 2-mode network give the whole resulting 1-mode network a different meaning.
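The projections can be made concrete with an incidence matrix \(A\) (documents in rows, cited references in columns, 1 = "cites"). Multiplying \(A\) with its transpose on one side or the other gives the two 1-mode networks: \(A A^\top\) counts the references two documents share (bibliographic coupling), \(A^\top A\) counts the documents that cite two references together (co-citation). Direct citations need no projection; they are the document-to-document edges themselves. A base-R sketch on made-up toy data; as far as I can tell, biblioNetwork() builds the same kind of matrices from the real records.

```r
# Toy 2-mode data: 3 citing documents (rows) x 4 cited references (columns)
A <- matrix(c(1, 1, 0, 0,
              0, 1, 1, 0,
              1, 1, 0, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("D1", "D2", "D3"), c("R1", "R2", "R3", "R4")))

coupling   <- tcrossprod(A)  # A %*% t(A): documents x documents, shared references
cocitation <- crossprod(A)   # t(A) %*% A: references x references, joint citations

# Diagonals hold the number of references per document / citations per reference;
# only the off-diagonal entries serve as edge weights of the projected network.
coupling
cocitation
```

D1 and D3 are the most strongly coupled pair (they share R1 and R2), while R1 and R2 form the strongest co-citation tie (both are cited by D1 and D3).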

1.2 Fun with the bibliometrix package

Lately, the bibliometrix package has become extremely good, and by now it is almost suitable to replace my hand-made workflows. So, I will spare you the data munging and demonstrate how to use its nice built-in functionalities here. By doing so, you will develop a lot of intuition on network projection and aggregation on different levels.

library(bibliometrix)

1.2.1 Loading the data

So, let's load some data. Since it is the topic of this lecture series, why not do a bibliographic mapping of the “innovation system” and “innovation ecosystem” literature? Here I use the Web of Science database of scientific literature, from which I downloaded the results of the following query.

  • Data source: Clarivate Analytics Web of Science (http://apps.webofknowledge.com)
  • Data format: bibtex
  • Query: TOPIC: (“innovation system” OR “systems of innovation” OR “innovation ecosystem”)
  • Timespan: the beginning of time - March 2019
  • Document Type: Articles
  • Language: English
  • Query date: March 2019
  • Selection: 1000 most cited

We now read the raw data with the built-in readFiles() and convert2df() functions.

M <- readFiles("../input/wos_1.bib", "../input/wos_2.bib") %>%
  convert2df(dbsource = "isi",
             format = "bibtex")
## 
## Converting your isi collection into a bibliographic dataframe
## 
## Articles extracted   100 
## Articles extracted   200 
## Articles extracted   300 
## Articles extracted   400 
## Articles extracted   500 
## Articles extracted   600 
## Articles extracted   700 
## Articles extracted   800 
## Articles extracted   900 
## Articles extracted   1000 
## Done!
## 
## 
## Generating affiliation field tag AU_UN from C1:  Done!
M %>% head()

1.2.2 Descriptive Analysis

Although bibliometrics is mainly known for quantifying the scientific production and measuring its quality and impact, it is also useful for displaying and analysing the intellectual, conceptual and social structures of research as well as their evolution and dynamical aspects.

In this way, bibliometrics aims to describe how specific disciplines, scientific domains, or research fields are structured and how they evolve over time. In other words, bibliometric methods help to map the science (so-called science mapping) and are very useful in the case of research synthesis, especially for the systematic ones.

Bibliometrics is an academic field founded on a set of statistical methods, which can be used to quantitatively analyze large volumes of scientific data and their evolution over time, and to discover information. Network structures are often used to model the interaction among authors, papers/documents/articles, references, keywords, etc.

Bibliometrix is open-source software for automating the stages of data analysis and data visualization. After the bibliographic data has been converted and loaded into R, Bibliometrix performs a descriptive analysis and different research-structure analyses.

Descriptive analysis provides some snapshots of the annual research development, the top k most productive authors, papers, and countries, and the most relevant keywords.

1.2.2.1 Main findings about the collection

results <- biblioAnalysis(M)
summary(results, 
        k = 20, 
        pause = FALSE)
## 
## 
## Main Information about data
## 
##  Documents                             1000 
##  Sources (Journals, Books, etc.)       288 
##  Keywords Plus (ID)                    870 
##  Author's Keywords (DE)                1172 
##  Period                                1975 - 2018 
##  Average citations per documents       71.11 
## 
##  Authors                               1830 
##  Author Appearances                    2397 
##  Authors of single-authored documents  215 
##  Authors of multi-authored documents   1615 
##  Single-authored documents             251 
## 
##  Documents per Author                  0.546 
##  Authors per Document                  1.83 
##  Co-Authors per Documents              2.4 
##  Collaboration Index                   2.16 
##  
##  Document types                     
##  ARTICLE                         411 
##  ARTICLE, PROCEEDINGS PAPER      47 
##  EDITORIAL MATERIAL              5 
##  PROCEEDINGS PAPER               2 
##  REVIEW                          34 
##  REVIEW, BOOK CHAPTER            1 
##  
## 
## Annual Scientific Production
## 
##  Year    Articles
##     1975        1
##     1990        1
##     1991        1
##     1992        4
##     1993        2
##     1994        3
##     1995        7
##     1996        3
##     1997        5
##     1998       15
##     1999       12
##     2000       19
##     2001       26
##     2002       34
##     2003       32
##     2004       31
##     2005       35
##     2006       31
##     2007       46
##     2008       55
##     2009       74
##     2010       60
##     2011       90
##     2012       71
##     2013       75
##     2014       73
##     2015       84
##     2016       62
##     2017       33
##     2018       15
## 
## Annual Percentage Growth Rate 9.787999 
## 
## 
## Most Productive Authors
## 
##    Authors        Articles   Authors        Articles Fractionalized
## 1   HEKKERT MP          25 HEKKERT MP                          8.62
## 2   KLERKX L            14 COOKE P                             5.00
## 3   COENEN L            11 LEYDESDORFF L                       5.00
## 4   TRUFFER B           11 KLERKX L                            4.55
## 5   LEYDESDORFF L       10 MOWERY DC                           4.50
## 6   JACOBSSON S          9 JACOBSSON S                         4.20
## 7   NEGRO SO             9 COENEN L                            4.08
## 8   COOKE P              8 WONGLIMPIYARAT J                    4.00
## 9   LEEUWIS C            8 TRUFFER B                           3.82
## 10  MARKARD J            8 CHEN SH                             3.50
## 11  DOLOREUX D           7 FREEMAN C                           3.50
## 12  ARCHIBUGI D          6 FRITSCH M                           3.50
## 13  GUAN J               6 HUNG SC                             3.50
## 14  HARMAAKORPI V        6 DOLOREUX D                          3.33
## 15  ISAKSEN A            6 TRIPPL M                            3.33
## 16  LEHRER M             6 HARMAAKORPI V                       3.17
## 17  TRIPPL M             6 LEHRER M                            3.17
## 18  BERGEK A             5 DIEZ JR                             3.08
## 19  BINZ C               5 KITAGAWA F                          3.00
## 20  DIEZ JR              5 MOTOHASHI K                         3.00
## 
## 
## Top manuscripts per citations
## 
##                                  Paper           TC TCperYear
## 1  GEELS FW, 2004, RES POLICY                   950      63.3
## 2  FREEMAN C, 1995, CAMBR J ECON                869      36.2
## 3  MALERBA F, 2002, RES POLICY                  840      49.4
## 4  COOKE P, 1997, RES POLICY                    836      38.0
## 5  HEKKERT MP, 2007, TECHNOL FORECAST SOC CHANG 708      59.0
## 6  BERGEK A, 2008, RES POLICY                   571      51.9
## 7  ASHEIM BT, 2005, RES POLICY                  554      39.6
## 8  PITTAWAY L, 2004, INT J MANAG REV            544      36.3
## 9  LUNDVALL BA, 2002, RES POLICY                488      28.7
## 10 MOULAERT F, 2003, REG STUD                   443      27.7
## 11 JACOBSSON S, 2000, ENERGY POLICY             398      20.9
## 12 MEYER-KRAHMER F, 1998, RES POLICY            390      18.6
## 13 MULLER E, 2001, RES POLICY                   375      20.8
## 14 COOKE P, 1992, GEOFORUM                      343      12.7
## 15 ADNER R, 2006, HARV BUS REV                  317      24.4
## 16 BUNNELL TG, 2001, PROG HUM GEOGR             293      16.3
## 17 LIU XL, 2001, RES POLICY                     292      16.2
## 18 CHRISTENSEN JF, 2005, RES POLICY             257      18.4
## 19 COLOMBO MG, 2002, RES POLICY                 246      14.5
## 20 CARAYANNIS EG, 2009, INT J TECHNOL MANAGE    245      24.5
## 
## 
## Corresponding Author's Countries
## 
##           Country Articles   Freq SCP MCP MCP_Ratio
## 1  UNITED KINGDOM       76 0.1529  41  35     0.461
## 2  NETHERLANDS          54 0.1087  38  16     0.296
## 3  USA                  45 0.0905  32  13     0.289
## 4  GERMANY              41 0.0825  30  11     0.268
## 5  SWEDEN               36 0.0724  22  14     0.389
## 6  CHINA                25 0.0503  14  11     0.440
## 7  CANADA               23 0.0463  14   9     0.391
## 8  ITALY                21 0.0423  15   6     0.286
## 9  AUSTRIA              16 0.0322  13   3     0.188
## 10 FINLAND              15 0.0302  12   3     0.200
## 11 JAPAN                14 0.0282  11   3     0.214
## 12 SPAIN                14 0.0282   9   5     0.357
## 13 SWITZERLAND          13 0.0262   6   7     0.538
## 14 DENMARK              12 0.0241   9   3     0.250
## 15 FRANCE               12 0.0241   8   4     0.333
## 16 NORWAY               12 0.0241   9   3     0.250
## 17 TAIWAN               12 0.0241   9   3     0.250
## 18 KOREA                 9 0.0181   8   1     0.111
## 19 BELGIUM               6 0.0121   4   2     0.333
## 20 ISRAEL                5 0.0101   3   2     0.400
## 
## 
## SCP: Single Country Publications
## 
## MCP: Multiple Country Publications
## 
## 
## Total Citations per Country
## 
##      Country      Total Citations Average Article Citations
## 1  UNITED KINGDOM            7033                      92.5
## 2  NETHERLANDS               4864                      90.1
## 3  SWEDEN                    3275                      91.0
## 4  USA                       2721                      60.5
## 5  GERMANY                   2712                      66.1
## 6  ITALY                     2319                     110.4
## 7  DENMARK                   1442                     120.2
## 8  CHINA                     1327                      53.1
## 9  CANADA                    1308                      56.9
## 10 FRANCE                    1185                      98.8
## 11 AUSTRIA                   1177                      73.6
## 12 SWITZERLAND                663                      51.0
## 13 NORWAY                     626                      52.2
## 14 JAPAN                      591                      42.2
## 15 FINLAND                    549                      36.6
## 16 TAIWAN                     523                      43.6
## 17 SPAIN                      467                      33.4
## 18 KOREA                      409                      45.4
## 19 BELGIUM                    389                      64.8
## 20 SINGAPORE                  339                     113.0
## 
## 
## Most Relevant Sources
## 
##                                       Sources        Articles
## 1  RESEARCH POLICY                                        125
## 2  TECHNOLOGICAL FORECASTING AND SOCIAL CHANGE             70
## 3  EUROPEAN PLANNING STUDIES                               46
## 4  TECHNOVATION                                            39
## 5  ENERGY POLICY                                           33
## 6  SCIENTOMETRICS                                          31
## 7  TECHNOLOGY ANALYSIS \\& STRATEGIC MANAGEMENT            29
## 8  REGIONAL STUDIES                                        25
## 9  INTERNATIONAL JOURNAL OF TECHNOLOGY MANAGEMENT          22
## 10 JOURNAL OF CLEANER PRODUCTION                           20
## 11 SCIENCE AND PUBLIC POLICY                               19
## 12 JOURNAL OF TECHNOLOGY TRANSFER                          18
## 13 AGRICULTURAL SYSTEMS                                    16
## 14 RENEWABLE \\& SUSTAINABLE ENERGY REVIEWS                13
## 15 R \\& D MANAGEMENT                                      12
## 16 ENVIRONMENTAL INNOVATION AND SOCIETAL TRANSITIONS       10
## 17 INDUSTRY AND INNOVATION                                 10
## 18 INNOVATION-MANAGEMENT POLICY \\& PRACTICE               10
## 19 ENVIRONMENT AND PLANNING C-GOVERNMENT AND POLICY         9
## 20 WORLD DEVELOPMENT                                        9
## 
## 
## Most Relevant Keywords
## 
##          Author Keywords (DE)      Articles      Keywords-Plus (ID)     Articles
## 1  INNOVATION SYSTEM                     66 TECHNOLOGY                        66
## 2  INNOVATION                            54 INNOVATION                        65
## 3  REGIONAL INNOVATION SYSTEM            29 KNOWLEDGE                         58
## 4  INNOVATION SYSTEMS                    26 SYSTEMS                           54
## 5  INNOVATION POLICY                     24 NETWORKS                          53
## 6  CHINA                                 21 POLICY                            51
## 7  TECHNOLOGICAL INNOVATION SYSTEM       21 RESEARCH AND DEVELOPMENT          47
## 8  NATIONAL INNOVATION SYSTEM            16 FIRMS                             46
## 9  NATIONAL SYSTEMS OF INNOVATION        14 INDUSTRY                          43
## 10 REGIONAL INNOVATION SYSTEMS           13 PERSPECTIVE                       43
## 11 NETWORKS                              12 SCIENCE                           41
## 12 POLICY                                11 FRAMEWORK                         38
## 13 R D                                   11 MANAGEMENT                        32
## 14 SYSTEMS OF INNOVATION                 11 PERFORMANCE                       29
## 15 INSTITUTIONS                           9 DYNAMICS                          28
## 16 NATIONAL INNOVATION SYSTEMS            9 DIFFUSION                         27
## 17 DEVELOPMENT                            8 GROWTH                            27
## 18 INNOVATION NETWORKS                    8 RENEWABLE ENERGY TECHNOLOGY       23
## 19 LOCK IN                                8 EVOLUTION                         21
## 20 OPEN INNOVATION                        8 SPILLOVERS                        21
plot(results)
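Several of the ratios in the summary are simple arithmetic on the counts printed above them, which makes for a useful sanity check when reading such tables. A sketch that recomputes them from those counts. The Collaboration Index is taken here as authors of multi-authored documents divided by the number of multi-authored documents, a common definition that reproduces the printed 2.16; that this is exactly how bibliometrix computes it is my assumption.

```r
# Counts copied from the summary output above
documents          <- 1000
authors            <- 1830
author_appearances <- 2397
authors_multi_docs <- 1615   # authors of multi-authored documents
single_docs        <- 251    # single-authored documents

docs_per_author     <- documents / authors
authors_per_doc     <- authors / documents
coauthors_per_doc   <- author_appearances / documents          # appearances per document
collaboration_index <- authors_multi_docs / (documents - single_docs)

round(c(docs_per_author = docs_per_author,
        authors_per_doc = authors_per_doc,
        coauthors_per_doc = coauthors_per_doc,
        collaboration_index = collaboration_index), 3)
```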

1.2.2.2 Most Cited References (internally)

CR <- citations(M, 
                field = "article", 
                sep = ";")
cbind(CR$Cited[1:10])
##                                                                                        [,1]
## LUNDVALL B.-A, 1992, NATL SYSTEMS INNOVAT.                                               90
## NELSON R, 1993, NATL INNOVATION SYST.                                                    78
## EDQUIST C., 1997, SYSTEMS INNOVATION T.                                                  65
## BERGEK A, 2008, RES POLICY, V37, P407, DOI 10.1016/J.RESPOL.2007.12.003.                 62
## FREEMAN C, 1987, TECHNOLOGY POLICY EC.                                                   60
## CARLSSON B, 1991, J EVOLUTIONARY EC, V1, P93, DOI DOI 10.1007/BF01224915.                58
## COHEN WM, 1990, ADMIN SCI QUART, V35, P128, DOI 10.2307/2393553.                         53
## HEKKERT MP, 2007, TECHNOL FORECAST SOC, V74, P413, DOI 10.1016/J.TECHFORE.2006.03.002.   50
## NELSON R, 1982, EVOLUTIONARY THEORY.                                                     46
## CARLSSON B, 2002, RES POLICY, V31, P233, DOI 10.1016/S0048-7333(01)00138-X.              42

1.2.3 Bibliographic Coupling Analysis: The Knowledge Frontier of the Field

Bibliographic coupling links two documents when they cite the same references; it has turned out to be very appropriate for capturing a field's current knowledge frontier. I will show you how to do it here, but in case you are interested, read my paper :)

NetMatrix <- biblioNetwork(M, 
                           analysis = "coupling", 
                           network = "references", 
                           sep = ";")

net <- networkPlot(NetMatrix, 
            n = 50, 
            Title = "Bibliographic Coupling Network", 
            type = "fruchterman", 
            size.cex = TRUE, 
            size = 20, 
            remove.multiple = FALSE, 
            labelsize = 0.7,
            edgesize = 10, 
            edges.min = 5)

1.2.4 Co-citation Analysis: The Intellectual Structure and Knowledge Bases of the field

Citation analysis is one of the main classic techniques in bibliometrics. It shows the structure of a specific field through the linkages between nodes (e.g. authors, papers, journals), while the edges can be interpreted differently depending on the network type, namely co-citation, direct citation, or bibliographic coupling.

Below there are three examples.

  • First, a co-citation network that shows relations between cited-reference works (nodes).
  • Second, a co-citation network that uses cited journals as the unit of analysis. Useful dimensions for interpreting co-citation networks are: (i) centrality and peripherality of nodes, (ii) their proximity and distance, (iii) strength of ties, (iv) clusters, and (v) bridging contributions.
  • Third, a historiograph built on direct citations. It draws the intellectual linkages in historical order. The works cited by thousands of authors in a collection of published scientific articles are sufficient for reconstructing the historiographic structure of the field and calling out its basic works.

1.2.4.1 Co-citation (cited references) analysis

Plot options:

  • n = 50 (the function plots the main 50 cited references)
  • type = “fruchterman” (the network layout is generated using the Fruchterman-Reingold Algorithm)
  • size.cex = TRUE (the size of the vertices is proportional to their degree)
  • size = 20 (the max size of vertices)
  • remove.multiple=FALSE (multiple edges are not removed)
  • labelsize = 0.7 (defines the size of vertex labels)
  • edgesize = 10 (The thickness of the edges is proportional to their strength. Edgesize defines the max value of the thickness)
  • edges.min = 5 (plots only edges with a strength greater than or equal to 5)
  • all other arguments assume the default values
NetMatrix <- biblioNetwork(M, 
                           analysis = "co-citation", 
                           network = "references", 
                           sep = ";")

net <- networkPlot(NetMatrix, 
            n = 50, 
            Title = "Co-Citation Network", 
            type = "fruchterman", 
            size.cex = TRUE, 
            size = 20, 
            remove.multiple = FALSE, 
            labelsize = 0.7,
            edgesize = 10, 
            edges.min = 5)

1.2.4.2 Cited Journal (Source) co-citation analysis

M <- metaTagExtraction(M, "CR_SO", sep=";")

NetMatrix <- biblioNetwork(M, 
                           analysis = "co-citation", 
                           network = "sources", 
                           sep = ";")

net <- networkPlot(NetMatrix, 
            n = 50, 
            Title = "Co-Citation Network", 
            type = "auto", 
            size.cex = TRUE, 
            size = 15, 
            remove.multiple = FALSE, 
            labelsize = 0.7,
            edgesize = 10, 
            edges.min = 5)

By the way, the result also contains a “hidden” igraph object. That is new, and it makes further analysis of the results possible. Great!

str(net, max.level = 2)
## List of 5
##  $ graph      :List of 10
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..- attr(*, "class")= chr "igraph"
##  $ graph_pajek:List of 10
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..$ :List of 1
##   ..- attr(*, "class")= chr "igraph"
##  $ cluster_obj:List of 3
##   ..$ merges    : chr [1:20] "technol anal strateg" "technical change ec" "systems innovation t" "j evolutionary ec" ...
##   ..$ modularity: chr [1:19] "am econ rev" "acad manage rev" "j econ lit" "technovation" ...
##   ..$ membership: chr [1:11] "regional innovation." "european planning st" "oxford hdb innovatio" "j technology transfe" ...
##   ..- attr(*, "class")= chr "communities"
##  $ cluster_res:'data.frame': 50 obs. of  3 variables:
##   ..$ vertex        : Factor w/ 50 levels "acad manage rev",..: 45 43 41 24 37 38 10 16 14 48 ...
##   ..$ cluster       : num [1:50] 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ btw_centrality: num [1:50] 0.1684 0.0178 0.0239 0.0806 1.2645 ...
##  $ layout     : num [1:50, 1:2] -0.265 -0.3487 -0.9116 -0.0946 0.0219 ...
net$graph
## IGRAPH b483ec8 UN-- 50 16723 -- 
## + attr: name (v/c), deg (v/n), size (v/n), label.cex (v/n), color (v/c), community (v/n), color (e/c), num
## | (e/n), width (e/n)
## + edges from b483ec8 (vertex names):
##  [1] technol anal strateg--technical change ec  technol anal strateg--technical change ec 
##  [3] technol anal strateg--technical change ec  technol anal strateg--technical change ec 
##  [5] technol anal strateg--technical change ec  technol anal strateg--technical change ec 
##  [7] technol anal strateg--technical change ec  technol anal strateg--technical change ec 
##  [9] technol anal strateg--technical change ec  technol anal strateg--systems innovation t
## [11] technol anal strateg--systems innovation t technol anal strateg--systems innovation t
## [13] technol anal strateg--systems innovation t technol anal strateg--systems innovation t
## + ... omitted several edges

Some summary statistics. I will only provide them here, but they are available for all objects created with biblioNetwork().

netstat <- networkStat(NetMatrix)
summary(netstat, k = 10)
## 
## 
## Main statistics about the network
## 
##  Size                                  7556 
##  Density                               0.011 
##  Transitivity                          0.175 
##  Diameter                              6 
##  Degree Centralization                 0.808 
##  Average path length                   2.155 
## 
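
To make these numbers less abstract, here is a minimal base-R sketch that computes the same kinds of statistics by hand for a small hypothetical network (a triangle with one extra node attached). It uses textbook definitions (global transitivity, Freeman degree centralization, mean shortest path over connected pairs). networkStat() may treat self-loops, weights or disconnected pairs differently, so read this as an illustration, not a reimplementation.

```r
# Hypothetical toy network: triangle A-B-C plus node D attached to C
nodes <- c("A", "B", "C", "D")
adj <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
edges <- rbind(c("A", "B"), c("A", "C"), c("B", "C"), c("C", "D"))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1                       # undirected: make the matrix symmetric

n   <- nrow(adj)                             # size (number of nodes)
m   <- sum(adj) / 2                          # number of edges
deg <- rowSums(adj)                          # node degrees

net_density <- m / (n * (n - 1) / 2)         # share of possible edges that exist

# Global transitivity: 3 x triangles / connected triples
triangles    <- sum(diag(adj %*% adj %*% adj)) / 6
triples      <- sum(deg * (deg - 1) / 2)
transitivity <- 3 * triangles / triples

# Freeman degree centralization (1 for a star, 0 when all degrees are equal)
centralization <- sum(max(deg) - deg) / ((n - 1) * (n - 2))

# All shortest path lengths via Floyd-Warshall
sp <- ifelse(adj == 1, 1, Inf)
diag(sp) <- 0
for (k in seq_len(n)) {
  sp <- pmin(sp, outer(sp[, k], sp[k, ], "+"))
}
d <- sp[upper.tri(sp)]
d <- d[is.finite(d)]                         # ignore pairs that are not connected
diameter <- max(d)
avg_path_length <- mean(d)

# expected: size 4, density 0.667, transitivity 0.6, diameter 2,
# degree centralization 0.667, average path length 1.333
round(c(size = n, density = net_density, transitivity = transitivity,
        diameter = diameter, degree_centralization = centralization,
        avg_path_length = avg_path_length), 3)
```

Read the same way, the density of 0.011 reported above means that only about 1% of all possible node pairs in the bibliometrix network are directly linked.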

1.2.4.3 Historiograph - Direct citation linkages

We can also look at a historiograph, which maps the direct citation links between the most-cited documents of the collection over time.

histResults <- histNetwork(M, 
                           min.citations = quantile(M$TC,0.75, na.rm = TRUE), 
                           sep = ";")
## Articles analysed   100 
## Articles analysed   127
net <- histPlot(histResults, 
                n = 20, 
                size.cex=TRUE, 
                size = 5, 
                labelsize = 3, 
                arrowsize = 0.5)

## 
##  Legend
## 
##                                        Paper                           DOI Year LCS GCS
## 1992 - 1             COOKE P, 1992, GEOFORUM  10.1016/0016-7185(92)90048-9 1992  12 343
## 1997 - 7           COOKE P, 1997, RES POLICY 10.1016/S0048-7333(97)00025-5 1997  41 836
## 1998 - 9   MEYER-KRAHMER F, 1998, RES POLICY 10.1016/S0048-7333(98)00094-8 1998   6 390
## 1999 - 15       EDQUIST C, 1999, TECHNOL SOC 10.1016/S0160-791X(98)00037-2 1999   7 127
## 2001 - 24           LIU XL, 2001, RES POLICY 10.1016/S0048-7333(00)00132-3 2001  14 292
## 2001 - 25       KAUFMANN A, 2001, RES POLICY 10.1016/S0048-7333(00)00118-9 2001   7 182
## 2002 - 31        MALERBA F, 2002, RES POLICY 10.1016/S0048-7333(01)00139-1 2002  28 840
## 2002 - 32      LUNDVALL BA, 2002, RES POLICY 10.1016/S0048-7333(01)00137-8 2002  19 488
## 2002 - 34        FREEMAN C, 2002, RES POLICY 10.1016/S0048-7333(01)00136-6 2002  10 219
## 2003 - 43         MOULAERT F, 2003, REG STUD   10.1080/0034340032000065442 2003   9 443
## 2004 - 49         GEELS FW, 2004, RES POLICY  10.1016/J.RESPOL.2004.01.015 2004  13 950
## 2005 - 57        ASHEIM BT, 2005, RES POLICY  10.1016/J.RESPOL.2005.03.013 2005  15 554
## 2005 - 59   IAMMARINO S, 2005, EUR PLAN STUD     10.1080/09645310500107084 2005   6 122
## 2006 - 67         SHARIF N, 2006, RES POLICY  10.1016/J.RESPOL.2006.04.001 2006   7 165
## 2006 - 70    LEYDESDORFF L, 2006, RES POLICY  10.1016/J.RESPOL.2006.09.027 2006   6  81
## 2008 - 81         BERGEK A, 2008, RES POLICY  10.1016/J.RESPOL.2007.12.003 2008  62 571
## 2008 - 86        KLERKX L, 2008, FOOD POLICY 10.1016/J.FOODPOL.2007.10.001 2008   6 126
## 2010 - 106      COENEN L, 2010, J CLEAN PROD 10.1016/J.JCLEPRO.2010.04.003 2010   8  95
## 2012 - 113        WEBER KM, 2012, RES POLICY  10.1016/J.RESPOL.2011.10.015 2012   8 177
## 2014 - 126          BINZ C, 2014, RES POLICY  10.1016/J.RESPOL.2013.07.002 2014   7  92
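
In the legend, LCS (local citation score) counts how often a paper is cited by the other documents in our own collection, while GCS (global citation score) is its total citation count as reported by the database (the TC field). The base-R sketch below uses a hypothetical four-paper citation matrix to show how LCS follows directly from the direct citation links; the GCS numbers are made up.

```r
# Hypothetical direct citation matrix: cit[i, j] == 1 means paper i cites paper j
papers <- c("P1", "P2", "P3", "P4")
cit <- matrix(0, 4, 4, dimnames = list(citing = papers, cited = papers))
cit["P2", "P1"] <- 1
cit["P3", "P1"] <- 1
cit["P3", "P2"] <- 1
cit["P4", "P1"] <- 1
cit["P4", "P3"] <- 1

# Local citation score: citations received from inside the collection
lcs <- colSums(cit)

# Global citation score: total citations in the database (hypothetical numbers)
gcs <- c(P1 = 120, P2 = 40, P3 = 15, P4 = 2)

data.frame(LCS = lcs, GCS = gcs, local_share = round(lcs / gcs, 3))
```

This is why, in the legend above, BERGEK A (2008) has the highest LCS (62) even though GEELS FW (2004) has a higher GCS (950 vs. 571): Bergek et al. are cited more often within this particular body of literature.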

1.2.5 The conceptual structure and context - Co-Word Analysis

Co-word networks reveal the conceptual structure of a field by uncovering links between concepts through term co-occurrences.

The conceptual structure is often used to understand the topics covered by scholars (the so-called research front) and to identify the most important and the most recent issues.

Dividing the whole timespan into different time slices and comparing their conceptual structures is useful to analyze the evolution of topics over time.

Bibliometrix can analyze keywords as well as the terms in the articles’ titles and abstracts. It does so using network analysis, correspondence analysis (CA) or multiple correspondence analysis (MCA). CA and MCA visualise the conceptual structure in a two-dimensional plot.

We can do far fancier things with abstracts or full texts (and we will). However, I don't want to spoil Roman's sessions, so I will hold myself back here.

1.2.5.1 Co-word Analysis through Keyword co-occurrences

Plot options:

  • normalize = “association” (the vertex similarities are normalized using association strength)
  • n = 50 (the function plots the 50 main vertices, here keywords)
  • type = “fruchterman” (the network layout is generated using the Fruchterman-Reingold Algorithm)
  • size.cex = TRUE (the size of the vertices is proportional to their degree)
  • size = 20 (the max size of the vertices)
  • remove.multiple = FALSE (multiple edges are not removed)
  • labelsize = 3 (defines the max size of vertex labels)
  • label.cex = TRUE (the vertex label sizes are proportional to their degree)
  • edgesize = 10 (the thickness of the edges is proportional to their strength; edgesize defines the max value of the thickness)
  • label.n = 50 (labels are plotted only for the main 50 vertices)
  • edges.min = 2 (plots only edges with a strength greater than or equal to 2)
  • all other arguments assume the default values
NetMatrix <- biblioNetwork(M, 
                           analysis = "co-occurrences", 
                           network = "keywords", 
                           sep = ";")

net <- networkPlot(NetMatrix, 
                   normalize = "association", 
                   n = 50, 
                   Title = "Keyword Co-occurrences", 
                   type = "fruchterman", 
                   size.cex = TRUE, size = 20, remove.multiple = FALSE, 
                   edgesize = 10, 
                   labelsize = 3,
                   label.cex = TRUE,
                   label.n = 50,
                   edges.min = 2)
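
Under the hood, a keyword co-occurrence network starts from a document-by-keyword incidence matrix: multiplying it by its own transpose counts, for every pair of keywords, the documents in which both appear, while the diagonal holds each keyword's total occurrences. Below is a minimal base-R sketch with hypothetical data, followed by association strength (co-occurrences divided by the product of the two keywords' occurrences), the normalization requested with normalize = "association". It uses the standard formula and is not copied from bibliometrix, whose implementation may differ in details.

```r
# Hypothetical document-by-keyword incidence matrix (1 = keyword appears in document)
A <- matrix(c(1, 1, 0, 0,
              1, 1, 1, 0,
              0, 1, 1, 1,
              1, 0, 0, 1,
              1, 1, 0, 0),
            nrow = 5, byrow = TRUE,
            dimnames = list(NULL, c("innovation system", "region", "policy", "transition")))

# t(A) %*% A: off-diagonal = co-occurrences, diagonal = occurrences
C <- crossprod(A)
occ <- diag(C)

# Association strength: co-occurrences / (occurrences_i * occurrences_j)
S_assoc <- C / outer(occ, occ)
diag(S_assoc) <- 0                           # drop self-similarity

C
round(S_assoc, 3)
```

Normalization changes the ranking: "innovation system" and "region" co-occur most often in absolute terms (3 documents), yet "region" and "policy" have the higher association strength (0.25 vs. 0.19), because the first two keywords are both very frequent and would co-occur often by chance anyway.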

1.2.5.2 Co-word Analysis through Correspondence Analysis

You already saw that coming, right?

CS <- conceptualStructure(M, 
                          method = "CA", 
                          field = "ID", 
                          minDegree = 10, 
                          k.max = 8, 
                          stemming = FALSE, 
                          labelsize = 8,
                          documents = 20)
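
conceptualStructure() performs the correspondence analysis for us. To give a rough idea of what happens underneath, here is a base-R sketch of classical CA on a small hypothetical document-by-keyword count table: the table is turned into standardized residuals, decomposed with an SVD, and the first two dimensions give the coordinates of the 2D map. This is the textbook algorithm, not bibliometrix's own code, which additionally handles keyword extraction and clustering.

```r
# Hypothetical document-by-keyword counts (rows = documents, columns = keywords)
N <- matrix(c(3, 1, 0, 0,
              2, 2, 0, 1,
              0, 1, 3, 2,
              0, 0, 2, 3),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("doc", 1:4),
                            c("cluster", "region", "transition", "sustainability")))

P      <- N / sum(N)        # correspondence matrix (relative frequencies)
r_mass <- rowSums(P)        # row masses
c_mass <- colSums(P)        # column masses

# Standardized residuals: deviation from independence, scaled by the masses
S  <- diag(1 / sqrt(r_mass)) %*% (P - r_mass %o% c_mass) %*% diag(1 / sqrt(c_mass))
sv <- svd(S)

# Principal coordinates on the first two dimensions (the 2D map)
kw_coords  <- diag(1 / sqrt(c_mass)) %*% sv$v[, 1:2] %*% diag(sv$d[1:2])
doc_coords <- diag(1 / sqrt(r_mass)) %*% sv$u[, 1:2] %*% diag(sv$d[1:2])
rownames(kw_coords)  <- colnames(N)
rownames(doc_coords) <- rownames(N)

# Share of total inertia explained by each dimension
inertia_share <- sv$d^2 / sum(sv$d^2)
round(kw_coords, 3)
```

Keywords used by the same documents end up close to each other; here "cluster" and "transition" land on opposite sides of the first dimension (the sign of each axis is arbitrary).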

1.2.5.3 Thematic Map

Co-word analysis identifies clusters of keywords. These clusters are treated as themes, whose density and centrality can be used to classify them and map them in a two-dimensional diagram, with centrality on the horizontal axis and density on the vertical axis.

The thematic map is a very intuitive plot, and we can analyze themes according to the quadrant in which they are placed: (1) upper-right quadrant: motor themes; (2) lower-right quadrant: basic themes; (3) lower-left quadrant: emerging or disappearing themes; (4) upper-left quadrant: very specialized/niche themes.

Please see Cobo, M. J., López-Herrera, A. G., Herrera-Viedma, E., & Herrera, F. (2011). An approach for detecting, quantifying, and visualizing the evolution of a research field: A practical application to the fuzzy sets theory field. Journal of Informetrics, 5(1), 146-166.
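
Cobo et al. (2011) measure a theme's centrality with Callon's centrality (the strength of its links to other themes) and its density with Callon's density (the strength of the links among its own keywords), both computed on the equivalence index e_ij = c_ij^2 / (c_i c_j). The base-R sketch below applies these formulas to a hypothetical co-occurrence matrix and a hand-made clustering, then assigns quadrants by comparing each theme with the mean centrality and density. The mean split is an assumption for illustration, the scaling constants 10 and 100 only rescale the axes, and thematicMap() itself may rank or scale the measures differently.

```r
# Hypothetical keyword co-occurrence matrix (diagonal = keyword occurrences)
kw <- c("cluster", "region", "policy", "transition", "sustainability")
C <- matrix(c(10, 6, 2, 0, 0,
               6, 9, 2, 1, 0,
               2, 2, 7, 2, 1,
               0, 1, 2, 8, 5,
               0, 0, 1, 5, 6),
            nrow = 5, byrow = TRUE, dimnames = list(kw, kw))

# Equivalence index e_ij = c_ij^2 / (c_i * c_j)
occ <- diag(C)
E <- C^2 / outer(occ, occ)
diag(E) <- 0

# Hypothetical themes (in practice they come from clustering the keyword network)
theme <- c(cluster = 1, region = 1, policy = 2, transition = 3, sustainability = 3)

callon <- as.data.frame(t(sapply(sort(unique(theme)), function(k) {
  inside <- theme == k
  c(theme      = k,
    # Callon centrality: strength of links to keywords of other themes
    centrality = 10 * sum(E[inside, !inside]),
    # Callon density: internal link strength (each pair counted once) per keyword
    density    = 100 * (sum(E[inside, inside]) / 2) / sum(inside))
})))

# Quadrants, splitting at the mean (an assumption made for this sketch)
high_c <- callon$centrality >= mean(callon$centrality)
high_d <- callon$density >= mean(callon$density)
callon$quadrant <- ifelse(high_c & high_d, "motor",
                   ifelse(high_c, "basic",
                   ifelse(high_d, "niche", "emerging/declining")))
callon
```

In this toy example the single-keyword theme "policy" is linked to everything else but has no internal links, so it lands among the basic themes, while the two tightly knit pairs end up as niche themes.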

NetMatrix <- biblioNetwork(M, 
                           analysis = "co-occurrences",
                           network = "keywords", 
                           sep = ";")

S <- normalizeSimilarity(NetMatrix, 
                         type = "association")

net <- networkPlot(S,
                   n = 500, 
                   Title = "Keyword co-occurrences",
                   type = "fruchterman",
                   labelsize = 2, 
                   halo = FALSE,
                   cluster = "walktrap", 
                   remove.isolates = FALSE,
                   remove.multiple = FALSE, 
                   noloops = TRUE, 
                   weighted = TRUE,
                   label.cex = TRUE,
                   edgesize = 5, 
                   size = 1,
                   edges.min = 2)

Map <- thematicMap(M,
                   minfreq = 5)
plot(Map$map)

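Let's inspect the clusters we found:

# Sort keywords within each cluster by frequency
clusters <- Map$words %>%
  arrange(Cluster, desc(Occurrences))

# Top three keywords per cluster and their share of the cluster's occurrences
clusters %>%
  select(Cluster, Words, Occurrences) %>%
  group_by(Cluster) %>%
  mutate(n.rel = Occurrences / sum(Occurrences)) %>%
  slice(1:3)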

1.2.6 The social structure - Collaboration Analysis

Collaboration networks show how authors, institutions (e.g. universities or departments) and countries relate to each other in a specific field of research. For example, the first figure below is a co-author network. It reveals regular study groups, hidden groups of scholars, and pivotal authors. The second figure is called “Edu collaboration network” and uncovers relevant institutions in a specific research field and their relations.

1.2.6.1 Author collaboration network

NetMatrix <- biblioNetwork(M %>% filter(!grepl("GESCHWIND", AU)), 
                           analysis = "collaboration",  
                           network = "authors", 
                           sep = ";")

S <- normalizeSimilarity(NetMatrix, type = "jaccard")

net <- networkPlot(S,  
                   n = 50, 
                   Title = "Author collaboration",
                   type = "auto", 
                   size = 10,
                   weighted = TRUE,
                   remove.isolates = TRUE,
                   size.cex = TRUE,
                   edgesize = 1,
                   labelsize = 0.6)
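
normalizeSimilarity(NetMatrix, type = "jaccard") rescales the raw collaboration counts. With the Jaccard index, the similarity of two authors is the number of papers they wrote together divided by the number of papers written by either of them. A base-R sketch on a hypothetical paper-by-author matrix (author_A, author_B and author_C are placeholders, not real authors):

```r
# Hypothetical paper-by-author incidence matrix (1 = author on the paper)
A <- matrix(c(1, 1, 0,
              1, 1, 0,
              1, 0, 1,
              1, 0, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("author_A", "author_B", "author_C")))

C <- crossprod(A)        # off-diagonal: joint papers; diagonal: papers per author
n_papers <- diag(C)

# Jaccard: joint papers / papers by either author (= n_i + n_j - joint)
J <- C / (outer(n_papers, n_papers, "+") - C)
diag(J) <- 0
round(J, 2)
```

Because the denominator grows with both authors' output, a prolific author who only occasionally co-writes with many people gets weak ties, which keeps hub authors from dominating the picture.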

1.2.6.2 Edu collaboration network

NetMatrix <- biblioNetwork(M, 
                           analysis = "collaboration",  
                           network = "universities", 
                           sep = ";")

net <- networkPlot(NetMatrix,  
                   n = 50, 
                   Title = "Edu collaboration",
                   type = "auto", 
                   size = 10,
                   size.cex = TRUE,
                   edgesize = 3,
                   labelsize = 0.6)

1.2.6.3 Country collaboration network

M <- metaTagExtraction(M, 
                       Field = "AU_CO", 
                       sep = ";")

NetMatrix <- biblioNetwork(M, 
                           analysis = "collaboration",  
                           network = "countries", 
                           sep = ";")

net <- networkPlot(NetMatrix,  
                   n = dim(NetMatrix)[1], 
                   Title = "Country collaboration",
                   type = "sphere", 
                   cluster = "louvain",
                   weighted = TRUE,
                   size = 10,
                   size.cex = TRUE,
                   edgesize = 1,
                   labelsize = 0.6)

Isn’t that all a lot of fun?

By now you should have realized that different levels of projection and aggregation offer almost endless possibilities for analyzing bibliographic data!

By the way: we can also do all of that with tidygraph and ggraph.

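library(tidygraph)
library(ggraph)

# Convert the country collaboration matrix into an undirected tbl_graph
g <- NetMatrix %>% as.matrix() %>% as_tbl_graph(directed = FALSE)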
g
## # A tbl_graph: 47 nodes and 174 edges
## #
## # An undirected multigraph with 5 components
## #
## # Node Data: 47 x 1 (active)
##   name          
##   <chr>         
## 1 NETHERLANDS   
## 2 ITALY         
## 3 SPAIN         
## 4 UNITED KINGDOM
## 5 GERMANY       
## 6 SWEDEN        
## # ... with 41 more rows
## #
## # Edge Data: 174 x 3
##    from    to weight
##   <int> <int>  <dbl>
## 1     1     1     80
## 2     1     2      1
## 3     1     4      9
## # ... with 171 more rows
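# Detect communities with the Louvain algorithm, using collaboration counts as edge weights
g <- g %N>%
    mutate(community = as.factor(group_louvain(weights = weight))) 

# Weighted degree (node strength), then plot the network with ggraph
g %N>%
  mutate(dgr = centrality_degree(weights = weight)) %>%
  arrange(desc(dgr)) %>%
  slice(1:200) %>%             # keeps at most the 200 strongest nodes (here: all 47 countries)
  ggraph(layout = 'fr') + 
  geom_edge_link(aes(width = weight), alpha = 0.2, colour = "grey") + 
  geom_node_point(aes(colour = community, size = dgr)) + 
  geom_node_text(aes(label = name), size = 1, repel = FALSE) +
  theme_graph()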
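
group_louvain() wraps igraph's Louvain algorithm, which searches for the partition that maximizes modularity: the share of edge weight inside communities minus the share expected if edges were placed at random given the nodes' degrees. The base-R function below (modularity_q() is a hypothetical helper, not part of tidygraph) only evaluates modularity for a given partition, i.e. the quantity Louvain optimizes. It assumes an undirected, weighted adjacency matrix without self-loops.

```r
# Hypothetical helper: modularity Q of a given partition of an undirected,
# weighted network without self-loops
modularity_q <- function(adj, membership) {
  k     <- rowSums(adj)                           # (weighted) node degrees
  two_m <- sum(adj)                               # twice the total edge weight
  same  <- outer(membership, membership, "==")    # TRUE if i and j share a community
  sum((adj - outer(k, k) / two_m) * same) / two_m
}

# Toy network: two triangles joined by a single edge (3-4)
adj <- matrix(0, 6, 6)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1

q_split <- modularity_q(adj, c(1, 1, 1, 2, 2, 2))  # one community per triangle
q_one   <- modularity_q(adj, rep(1, 6))            # everything in one community
c(split = q_split, one = q_one)                    # about 0.357 and 0
```

Louvain would prefer the two-triangle split because it has the higher modularity; in the country network, each community is likewise a group of countries that collaborate with each other more than their overall collaboration volume would lead us to expect.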